Skip to content

Add missing -race flag to build-race make target#86

Merged
Vervious merged 6 commits intoalgorand:masterfrom
Vervious:bin-race
Jun 26, 2019
Merged

Add missing -race flag to build-race make target#86
Vervious merged 6 commits intoalgorand:masterfrom
Vervious:bin-race

Conversation

@Vervious
Copy link
Copy Markdown
Contributor

@Vervious Vervious commented Jun 24, 2019

we were compiling without golang race detector where we intended the opposite.

It appears that we were compiling without golang race detector when we intended the opposite.
@Vervious Vervious requested a review from zeldovich June 24, 2019 19:50
@zeldovich
Copy link
Copy Markdown
Contributor

Thanks for catching this -- the -race flag got lost in some Makefile shuffling!

It looks like this now catches some races in kmd..

zeldovich
zeldovich previously approved these changes Jun 24, 2019
tsachiherman
tsachiherman previously approved these changes Jun 24, 2019
Copy link
Copy Markdown
Contributor

@tsachiherman tsachiherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is good, unfortunately, it fails during the unit tests. You might need to fix these before submitting this fix. ( if you feel that it goes beyond the scope of this task, please open a separate tickets for each individual failure )

@Vervious
Copy link
Copy Markdown
Contributor Author

(working on the races)

@zeldovich
Copy link
Copy Markdown
Contributor

If this is helpful -- I started looking at the failures that show up with this patch, and got as far as tracking at least one of the failures down to kmd's exit timeout, and some strange interaction with the race detection binaries.

Specifically, I get a quick reproducible failure on my laptop when I change defaultKMDTimeoutSecs = 1 in libgoal/libgoal.go and then running NODEBINDIR=$HOME/go/bin-race go test -v -run=TestAccountsCanSendMoney\$ ./test/e2e-go/features/transactions/

Vervious added 2 commits June 25, 2019 10:32
I accidnetally crashed my machine by failing too many e2e tests, and
leaving 20+ algods consuming too much of my cpu.
using value receivers constitutes a read, so we'd have to synchronize
method calls. so instead I think using pointer receivers is fine -
see line 226 (ws.shutdown = true) and line 262 (ws.writeStateFiles(addr))
where go -race complains
@Vervious Vervious dismissed stale reviews from tsachiherman and zeldovich via 7f946b2 June 25, 2019 15:24
@Vervious
Copy link
Copy Markdown
Contributor Author

Vervious commented Jun 25, 2019

(this might not be the only race - I haven't rerun the tests - but it's at least one race)

The other change is motivated by me accidentally crashing my machine running the e2e tests, since failures weren't cleaning up the dozens of algod processes that got spawned, which stole my cpu.

(as a whole, still wip)

@Vervious
Copy link
Copy Markdown
Contributor Author

Ok - this still doesn't fix the tests. (Also, thank you Nickolai for the reproducible tip! Very helpful so far). Still investigating.

@Vervious
Copy link
Copy Markdown
Contributor Author

Vervious commented Jun 25, 2019

It appears that doing something like

package main

import (
"fmt"
"golang.org/x/crypto/scrypt"
)

func main()  {
    password := []byte{10, 145, 184, 122, 252, 127, 236, 174, 244, 174, 64, 102, 14, 68, 244, 45, 43, 201, 124, 195, 192, 242, 69, 245, 67, 155, 71, 96, 15, 245, 243, 54}
    keySlice, err := scrypt.Key(password, []byte{}, 65536, 1, 32, 32)
    if err != nil {
        panic(err)
    }
    fmt.Println(keySlice)
}

with -race takes much longer than we expect. So this is why the failure with defaultKMDTimeoutSecs = 1 occurs - we kill kmd before it finishes encrypting the master encryption password.

I'm now looking at what happens when defaultKMDTimeoutSecs is larger

@zeldovich
Copy link
Copy Markdown
Contributor

Interesting. One possibility is to change the scrypt default params (like defaultScryptN in daemon/kmd/config/config.go) for // +build race to a lower value so that it doesn't take so long.

@Vervious
Copy link
Copy Markdown
Contributor Author

It's possible though I'm wondering if we should muck around with kmd's minScryptN parameter (in kmd/wallet/driver/sqlite_crypto.go). Or we can exclude -race from kmd entirely... none of the solutions that I can think of sound great.

(This is all assuming Scrypt is actually the issue for larger timeouts.)

@algoradam
Copy link
Copy Markdown
Contributor

I say exclude -race from kmd entirely, at least when run on travis (except maybe on builds that run overnight). Mucking around with the kmd codebase to let tests use different scrypt parameters sounds like complexity we don't want inside kmd. We want kmd code to have as little complexity as possible; if kmd race tests require nontrivial messing with kmd internals then they're doing more harm than good.

@Vervious
Copy link
Copy Markdown
Contributor Author

So locally, with timeout = 60 seconds, the same problem occurs. It looks like (in sqlite_crypto.go:81:deriveEncryptionkeyWithSalt

keySlice, err := scrypt.Key(password, salt[:], cfg.ScryptN, cfg.ScryptR, cfg.ScryptP, masterKeyLen)

takes at least a minute locally for me when running make integration

@zeldovich
Copy link
Copy Markdown
Contributor

Sure, let's exclude kmd from -race for now

Vervious added 2 commits June 26, 2019 10:01
And put it into bin-race to avoid messing with too much
of the testing flow. Apologies for complicating the makefile
even further.
@Vervious Vervious merged commit 7b15173 into algorand:master Jun 26, 2019
@Vervious Vervious deleted the bin-race branch June 26, 2019 16:10
algorandskiy pushed a commit to algorandskiy/go-algorand that referenced this pull request May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants